
Revisiting distance-based record linkage for privacy-preserving release of statistical datasets


Abstract

Statistical Disclosure Control (SDC) studies privacy-preserving data publishing in settings where the released data is expected to be used for statistical analysis. An original dataset T containing sensitive information is transformed into a sanitized version T', which is released to the public. Both utility and privacy matter in this setting. For utility, T' must allow data miners or statisticians to obtain results similar to those they would have obtained from the original dataset T. For privacy, T' must significantly reduce an adversary's ability to infer sensitive information about the data subjects in T. One of the main a-posteriori measures that the SDC community has used so far to analyze the privacy offered by a given protection method is the Distance-Based Record Linkage (DBRL) risk measure. In this work, we argue that the classical DBRL risk measure is insufficient, and we introduce the novel Global Distance-Based Record Linkage (GDBRL) risk measure. We claim that this new measure must be evaluated alongside the classical DBRL measure in order to better assess the risk of publishing T' instead of T. We then describe how the data owner can compute the new measure and discuss the scalability of those computations. We conclude with extensive experiments that compare the risk assessments given by our novel measure and by the classical one, using well-known SDC protection methods. These experiments validate our hypothesis that the GDBRL risk measure yields, in many cases, higher risk assessments than the classical DBRL measure. In other words, relying solely on the classical DBRL measure for risk assessment might be misleading, as the true risk may in fact be higher. Hence, we strongly recommend that the SDC community consider the new GDBRL risk measure as an additional measure when analyzing the privacy offered by SDC protection algorithms.
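The abstract does not define either measure formally, so the following is only an illustrative sketch under stated assumptions. It assumes that record i of T' is the masked version of record i of T, that records are numeric tuples compared by Euclidean distance, and that classical DBRL links each masked record independently to its nearest original record. The "global" variant here is one plausible reading of the name, namely a one-to-one assignment between T' and T that minimizes the total distance; the paper's actual GDBRL definition may differ. The function names `dbrl_risk` and `global_linkage_risk` are hypothetical, and the brute-force search over permutations is only workable for toy inputs.

```python
from itertools import permutations
from math import dist  # Euclidean distance, Python 3.8+


def dbrl_risk(original, masked):
    """Fraction of masked records whose nearest original record is their true source.

    Assumption: masked[i] was derived from original[i].
    Ties are broken in favour of the lowest index.
    """
    hits = 0
    for i, m in enumerate(masked):
        nearest = min(range(len(original)), key=lambda j: dist(m, original[j]))
        hits += nearest == i
    return hits / len(masked)


def global_linkage_risk(original, masked):
    """Fraction of correct links under a minimum-total-distance one-to-one assignment.

    Assumption: this is an illustrative "global" linkage, not necessarily
    the paper's GDBRL. Brute force over all permutations, so toy sizes only.
    """
    n = len(masked)

    def total_cost(perm):
        return sum(dist(masked[i], original[perm[i]]) for i in range(n))

    best = min(permutations(range(n)), key=total_cost)
    return sum(best[i] == i for i in range(n)) / n


# Toy one-attribute dataset: two original records and their masked versions.
T = [(0.0,), (10.0,)]
T_masked = [(6.0,), (11.0,)]

print(dbrl_risk(T, T_masked))            # 0.5
print(global_linkage_risk(T, T_masked))  # 1.0
```

In this toy case the first masked record lies closer to the wrong original record, so independent nearest-neighbour linkage re-identifies only one of the two records. Requiring a one-to-one assignment recovers both, which mirrors the abstract's claim that a global view can report a higher risk than the classical measure. For realistic sizes the permutation search would have to be replaced by an assignment solver such as the Hungarian algorithm.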
